
    Two-stage index computation for bandits with switching penalties I : switching costs

    This paper addresses the multi-armed bandit problem with switching costs. Asawa and Teneketzis (1996) introduced an index that partly characterizes optimal policies, attaching to each bandit state a "continuation index" (its Gittins index) and a "switching index". They proposed to jointly compute both as the Gittins index of a bandit having 2n states (when the original bandit has n states), which results in an eight-fold increase in the O(n^{3}) arithmetic operations relative to those needed to compute the continuation index alone. This paper presents a more efficient, decoupled computation method, which in a first stage computes the continuation index and then, in a second stage, computes the switching index an order of magnitude faster in at most n^{2} + O(n) arithmetic operations. The paper exploits the fact that the Asawa and Teneketzis index is the Whittle, or marginal productivity, index of a classic bandit with switching costs in its restless reformulation, by deploying work-reward analysis and PCL-indexability methods introduced by the author. A computational study demonstrates the dramatic runtime savings achieved by the new algorithm, the near-optimality of the index policy, and its substantial gains against the benchmark Gittins index policy across a wide range of instances.
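To make the "continuation index" concrete, the following is a minimal sketch of a textbook largest-index-first computation of the Gittins index for a finite-state bandit. It is not the two-stage or fast-pivoting algorithm of the paper: it re-solves a linear system at every step and is therefore much slower than O(n^{3}). The function names `solve_linear` and `gittins_indices`, the per-unit-time normalization of the index, and the input layout (a reward list, a row-stochastic transition matrix `P`, a discount factor `beta` in [0, 1)) are assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def gittins_indices(rewards, P, beta):
    """Return the Gittins index of every state (naive illustrative version).

    States are indexed one at a time in decreasing index order. For a
    candidate state s, the index is the ratio of expected discounted reward
    to expected discounted time when starting in s and continuing for as
    long as the chain stays in the set of already-indexed states.
    """
    n = len(rewards)
    indices = [None] * n
    cont = []  # states already indexed (the current continuation set)
    while len(cont) < n:
        k = len(cont)
        if k:
            # (I - beta * P restricted to cont) applied to reward / time vectors
            a = [[(1.0 if i == j else 0.0) - beta * P[cont[i]][cont[j]]
                  for j in range(k)] for i in range(k)]
            num = solve_linear(a, [rewards[s] for s in cont])
            den = solve_linear(a, [1.0] * k)
        else:
            num, den = [], []
        best_state, best_ratio = None, None
        for s in range(n):
            if s in cont:
                continue
            reward = rewards[s] + beta * sum(P[s][cont[j]] * num[j] for j in range(k))
            time = 1.0 + beta * sum(P[s][cont[j]] * den[j] for j in range(k))
            ratio = reward / time
            if best_ratio is None or ratio > best_ratio:
                best_state, best_ratio = s, ratio
        indices[best_state] = best_ratio
        cont.append(best_state)
    return indices
```

For instance, `gittins_indices([0.0, 10.0], [[0.0, 1.0], [1.0, 0.0]], 0.5)` indexes the high-reward state first and then prices the other state by the reward it can reach through it.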

    Two-stage index computation for bandits with switching penalties II : switching delays

    This paper addresses the multi-armed bandit problem with switching penalties including both costs and delays, extending results of the companion paper [J. Niño-Mora. "Two-Stage Index Computation for Bandits with Switching Penalties I: Switching Costs". Conditionally accepted at INFORMS J. Comp.], which addressed the no switching delays case. Asawa and Teneketzis (1996) introduced an index for bandits with delays that partly characterizes optimal policies, attaching to each bandit state a "continuation index" (its Gittins index) and a "switching index", yet gave no algorithm for it. This paper presents an efficient, decoupled computation method, which in a first stage computes the continuation index and then, in a second stage, computes the switching index an order of magnitude faster in at most (5/2)n^{3} + O(n) arithmetic operations for an n-state bandit. The paper exploits the fact that the Asawa and Teneketzis index is the Whittle, or marginal productivity, index of a classic bandit with switching penalties in its semi-Markov restless reformulation, by deploying work-reward analysis and LP-indexability methods introduced by the author. A computational study demonstrates the dramatic runtime savings achieved by the new algorithm, the near-optimality of the index policy, and its substantial gains against a benchmark index policy across a wide instance range.

    Characterization and computation of restless bandit marginal productivity indices

    The Whittle index [P. Whittle (1988). Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25A, 287-298] yields a practical scheduling rule for the versatile yet intractable multi-armed restless bandit problem, involving the optimal dynamic priority allocation to multiple stochastic projects, modeled as restless bandits, i.e., binary-action (active/passive) (semi-) Markov decision processes. A growing body of evidence shows that such a rule is nearly optimal in a wide variety of applications, which raises the need to efficiently compute the Whittle index and more general marginal productivity index (MPI) extensions in large-scale models. For such a purpose, this paper extends to restless bandits the parametric linear programming (LP) approach deployed in [J. Niño-Mora. A (2/3)n^{3} fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain, INFORMS J. Comp., in press], which yielded a fast Gittins-index algorithm. Yet the extension is not straightforward, as the MPI is only defined for the limited range of so-called indexable bandits, which motivates the quest for methods to establish indexability. This paper furnishes algorithmic and analytical tools to realize the potential of MPI policies in large-scale applications, presenting the following contributions: (i) a complete algorithmic characterization of indexability, for which two block implementations are given; and (ii) more importantly, new analytical conditions for indexability (termed LP-indexability) that leverage knowledge on the structure of optimal policies in particular models, under which the MPI is computed faster by the adaptive-greedy algorithm previously introduced by the author under the more stringent PCL-indexability conditions, for which a new fast-pivoting block implementation is given.
The paper further reports on a computational study, measuring the runtime performance of the algorithms, and assessing by a simulation study the high prevalence of indexability and PCL-indexability.
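For orientation, the notion of indexability that this abstract relies on can be sketched in the standard Whittle form. The symbols below (state X_t, action a_t, reward R, discount factor β, activity charge λ, passive set Π) are assumptions chosen for illustration and need not match the paper's notation. The general MPI setting also weights the charge by a work measure, which is omitted here.

```latex
% Single-bandit subproblem with a charge \lambda per active period (0 < \beta < 1)
\[
  \max_{\pi}\; \mathbb{E}^{\pi}\!\left[\, \sum_{t=0}^{\infty} \beta^{t}
  \bigl( R(X_t, a_t) - \lambda\, a_t \bigr) \right],
  \qquad a_t \in \{0, 1\}.
\]

% Passive set: states where resting (a = 0) is optimal at charge \lambda
\[
  \Pi(\lambda) = \{\, i : a = 0 \text{ is optimal in state } i \,\}.
\]

% The bandit is indexable if \Pi(\lambda) grows monotonically from the empty
% set to the full state space as \lambda increases from -\infty to +\infty.
% The Whittle index of state i is then the break-even charge
\[
  \lambda^{*}(i) = \inf \{\, \lambda : i \in \Pi(\lambda) \,\}.
\]
```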

    Marginal productivity index policies for problems of admission control and routing to parallel queues with delay

    In this paper we consider the problem of admission control of Bernoulli arrivals to a buffer with geometric server, in which the controller’s actions take effect one period after the actual change in the queue length. An optimal policy in terms of marginal productivity indices (MPI) is derived for this problem under the following three performance objectives: (i) minimization of the expected total discounted sum of holding costs and rejection costs, (ii) minimization of the expected time-average sum of holding costs and rejection costs, and (iii) maximization of the expected time-average number of job completions. Our employment of existing theoretical and algorithmic results on restless bandit indexation together with some new results yields a fast algorithm that computes the MPI for a queue with a buffer size of I performing only O(I) arithmetic operations. Such MPI values can be used both to immediately obtain the optimal thresholds for the admission control problem, and to design an index policy for the routing problem (with possible admission control) in the multi-queue system. Thus, this paper further addresses the problem of designing and computing a tractable heuristic policy for dynamic job admission control and/or routing in a discrete time Markovian model of parallel loss queues with one-period delayed state observation and/or action implementation, which comes close to optimizing an infinite-horizon problem under the above three objectives. Our approach seems to be tractable also for the analogous problems with larger delays and, more generally, for arbitrary restless bandits with delays. Keywords: Admission control, Routing, Parallel queues, Delayed information, Delayed action implementation, Index policy, Restless bandits, Marginal productivity index

    An index for dynamic product promotion and the knapsack problem for perishable items

    This paper introduces the knapsack problem for perishable items (KPPI), which concerns the optimal dynamic allocation of a limited promotion space to a collection of perishable items. Such a problem is motivated by applications in a variety of industries, where products have an associated lifetime after which they cannot be sold. The paper builds on recent developments on restless bandit indexation and gives an optimal marginal productivity index policy for the dynamic (single) product promotion problem with closed-form indices that yield structural insights. The performance of the proposed policy for KPPI is investigated in a computational study. Keywords: Dynamic promotion, Perishable items, Index policies, Knapsack problem, Restless bandits, Finite horizon, Marginal productivity index

    Conservation Laws, Extended Polymatroids and Multi-Armed Bandit Problems; A Unified Approach to Indexable Systems

    We show that if performance measures in stochastic and dynamic scheduling problems satisfy generalized conservation laws, then the feasible space of achievable performance is a polyhedron called an extended polymatroid that generalizes the usual polymatroids introduced by Edmonds. Optimization of a linear objective over an extended polymatroid is solved by an adaptive greedy algorithm, which leads to an optimal solution having an indexability property (indexable systems). Under a certain condition, the indices have a stronger decomposition property (decomposable systems). The following classical problems can be analyzed using our theory: multi-armed bandit problems, branching bandits, multiclass queues, multiclass queues with feedback, deterministic scheduling problems. Interesting consequences of our results include: (1) a characterization of indexable systems as systems that satisfy generalized conservation laws, (2) a sufficient condition for indexable systems to be decomposable, (3) a new linear programming proof of the decomposability property of Gittins indices in multi-armed bandit problems, (4) a unified and practical approach to sensitivity analysis of indexable systems, (5) a new characterization of the indices of indexable systems as sums of dual variables and a new interpretation of the indices in terms of retirement options in the context of branching bandits, (6) the first rigorous analysis of the indexability of undiscounted branching bandits, (7) a new algorithm to compute the indices of indexable systems (in particular Gittins indices), which is as fast as the fastest known algorithm, (8) a unification of the algorithm of Klimov for multiclass queues and the algorithm of Gittins for multi-armed bandits as special cases of the same algorithm, (9) closed form formulae for the performance of the optimal policy, and (10) an understanding of the nondependence of the indices on some of the parameters of the stochastic scheduling problem.
Most importantly, our approach provides a unified treatment of several classical problems in stochastic and dynamic scheduling and is able to address in a unified way their variations such as: discounted versus undiscounted cost criterion, rewards versus taxes, preemption versus nonpreemption, discrete versus continuous time, work conserving versus idling policies, linear versus nonlinear objective functions.

    Restless Bandits, Linear Programming Relaxations and a Primal-Dual Heuristic

    We propose a mathematical programming approach for the classical PSPACE-hard problem of n restless bandits in stochastic optimization. We introduce a series of n increasingly stronger linear programming relaxations, the last of which is exact and corresponds to the formulation of the problem as a Markov decision process that has exponential size, while other relaxations provide bounds and are efficiently solvable. We also propose a heuristic for solving the problem that naturally arises from the first of these relaxations and uses indices that are computed through optimal dual variables from the first relaxation. In this way we propose a policy and a suboptimality guarantee. We report computational results that suggest that the value of the proposed heuristic policy is extremely close to the optimal value. Moreover, the second order relaxation provides strong bounds for the optimal solution value.

    Effects of hospital facilities on patient outcomes after cancer surgery: an international, prospective, observational study

    Background Early death after cancer surgery is higher in low-income and middle-income countries (LMICs) than in high-income countries, yet the impact of facility characteristics on early postoperative outcomes is unknown. The aim of this study was to examine the association of hospital infrastructure, resource availability, and processes with early outcomes after cancer surgery worldwide. Methods A multimethods analysis was performed as part of the GlobalSurg 3 study, a multicentre, international, prospective cohort study of patients who had surgery for breast, colorectal, or gastric cancer. The primary outcomes were 30-day mortality and 30-day major complication rates. Potentially beneficial hospital facilities were identified by variable selection to select those associated with 30-day mortality. Adjusted outcomes were determined using generalised estimating equations to account for patient characteristics and country-income group, with population stratification by hospital. Findings Between April 1, 2018, and April 23, 2019, facility-level data were collected for 9685 patients across 238 hospitals in 66 countries (91 hospitals in 20 high-income countries; 57 hospitals in 19 upper-middle-income countries; and 90 hospitals in 27 low-income to lower-middle-income countries). The availability of five hospital facilities was inversely associated with mortality: ultrasound, CT scanner, critical care unit, opioid analgesia, and oncologist. After adjustment for case-mix and country income group, hospitals with three or fewer of these facilities (62 hospitals, 1294 patients) had higher mortality compared with those with four or five (adjusted odds ratio [OR] 3.85 [95% CI 2.58-5.75]; p<0.0001), with excess mortality predominantly explained by a limited capacity to rescue following the development of major complications (63.0% vs 82.7%; OR 0.35 [0.23-0.53]; p<0.0001).
Across LMICs, improvements in hospital facilities would prevent one to three deaths for every 100 patients undergoing surgery for cancer. Interpretation Hospitals with higher levels of infrastructure and resources have better outcomes after cancer surgery, independent of country income. Without urgent strengthening of hospital infrastructure and resources, the reductions in cancer-associated mortality associated with improved access will not be realised.

    Optimization of Multiclass Queueing Networks with Changeover Times via the Achievable Region Approach: Part II, The Multi-Station Case

    In this paper we address the performance optimization problem in multi-station MQNETs with changeover times by means of the achievable region approach, with the objective of developing a systematic method for computing performance bounds and designing scheduling policies that nearly optimize performance objectives. We have investigated the corresponding problem for single-station MQNETs in a companion paper (see Bertsimas and Nino-Mora 1999